This report was prepared for ETC5521 Assignment 2. It is based on work by Team emu (Justin Thomas and Mayunk Bharadwaj) and was revised by Abhishek Sinha and Yiwen Zhang.

1 Introduction and Motivation

Using the data provided on the ‘tidytuesday’ platform, our primary aim is to identify the characteristics of a winning beach volleyball team for both males and females.

We believe that there might be differences in characteristics for a winning team compared to a losing team because of, for example, prevalence of beach volleyball in certain countries. Also, we theorize that taller and younger players may potentially be better at beach volleyball because of the competitive advantage they may have over shorter and more seasoned players.

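The secondary questions that will help us answer our primary question concern the countries that winning players come from and the average age and height of winning and losing teams. They are listed in full in Section 2.1.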

We will also explore the qualities of individual players to identify the most successful player and the most successful pairing. In addition to physiological qualities (height, age) and technical factors, we will study whether the winning team is affected by home advantage.

Having studied the characteristics of the winning team, we are also curious about a related question. Although the winning team is likely to be the stronger (higher-ranked) team, are there cases in which a lower-ranked team defeats a higher-ranked one? We therefore add four additional questions to complete this analysis, which are also listed in Section 2.1.

In the following report, the reader will be able to find a description and information about the source and limitations of the data; information on how the data was cleaned; an analysis that will answer the above questions and a conclusion.

1.1 Limitations of Data Analysis

While going through the dataset, we found that the data was incomplete because there were multiple ‘NA’ values for individual player performance statistics. As such, observations which featured ‘NA’ values had to be removed as they were unlikely to be helpful in our analysis. Due to this, the sample size will be reduced, which means that the accuracy of the research results may be affected to a certain extent.

2 Data description

2.1 Questions

Primary Question

What are the characteristics of a winning beach volleyball team for both males and females?

Secondary Questions

  • Which countries have the most winning players?
  • What is the average age of players on a winning team and how does it compare with a losing team?
  • What is the average height of players on a winning team and how does it compare with a losing team?

Additional Questions for Assignment 2

  • Looking into the FIVB circuit, is there any home advantage for winning players?

  • Do lower-ranked teams ever beat higher-ranked teams?

  • Which combinations of players are the most successful and have teamed up for the greatest number of matches in both volleyball circuits?

  • Who are the most successful players in beach volleyball, and how have they and their skill patterns evolved over time?

2.2 Explanation of data being used

This data set provides beach volleyball statistics for men’s and women’s matches at two major tournaments, the Fédération Internationale de Volleyball (FIVB) Beach Volleyball World Championships and the Association of Volleyball Professionals (AVP) tour. Matches are played between teams of two. The data set records tournament information, player information, player performance statistics and match results. The data ranges from September 2000 to August 2019 and was collected from records kept at the tournaments.

The original data source, created by Adam Vagner, initially covered September 2000 to July 2017. It has been updated periodically, with the most recent update in May 2020. It is available on GitHub (BigTimeStats, n.d.).

The structure of the data set is:

  • Rows: 76756
  • Columns: 65
  • Data types: Character, Numeric, Date and Difftime

There are 65 variables in this data set:

Variable Name
circuit
tournament
country
year
date
gender
match_num
w_player1
w_p1_birthdate
w_p1_age
w_p1_hgt
w_p1_country
w_player2
w_p2_birthdate
w_p2_age
w_p2_hgt
w_p2_country
w_rank
l_player1
l_p1_birthdate
l_p1_age
l_p1_hgt
l_p1_country
l_player2
l_p2_birthdate
l_p2_age
l_p2_hgt
l_p2_country
l_rank
score
duration
bracket
round
w_p1_tot_attacks
w_p1_tot_kills
w_p1_tot_errors
w_p1_tot_hitpct
w_p1_tot_aces
w_p1_tot_serve_errors
w_p1_tot_blocks
w_p1_tot_digs
w_p2_tot_attacks
w_p2_tot_kills
w_p2_tot_errors
w_p2_tot_hitpct
w_p2_tot_aces
w_p2_tot_serve_errors
w_p2_tot_blocks
w_p2_tot_digs
l_p1_tot_attacks
l_p1_tot_kills
l_p1_tot_errors
l_p1_tot_hitpct
l_p1_tot_aces
l_p1_tot_serve_errors
l_p1_tot_blocks
l_p1_tot_digs
l_p2_tot_attacks
l_p2_tot_kills
l_p2_tot_errors
l_p2_tot_hitpct
l_p2_tot_aces
l_p2_tot_serve_errors
l_p2_tot_blocks
l_p2_tot_digs

2.3 Data Cleaning

Our data was already in tidy format, so little cleaning was needed. However, to conduct our analysis, we tidied the data set further by removing variables that are not pertinent to our questions.

The method we used to tidy our data is as follows:

  • We deselected some variables from the data set and overwrote the original data set with the new, tidied data set.

We did not include variables such as match duration or individual player performance statistics because they did not help answer the questions we have laid out. Additionally, the majority of the values for these variables were missing, so they would not have been useful in our analysis.
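The deselection step can be sketched as follows. This is a minimal illustration in Python, not the R code used for the report. The sample row and the helper `drop_unused` are hypothetical; only the column names come from the data set.

```python
# Hypothetical single match row; real rows have 65 columns.
row = {
    "circuit": "FIVB",
    "country": "Brazil",
    "w_player1": "Player A",
    "w_p1_age": 28.4,
    "duration": "00:33:00",
    "w_p1_tot_kills": None,
    "l_p2_tot_digs": None,
}


def drop_unused(record):
    """Return a copy without match duration and per-player performance statistics."""
    return {
        name: value
        for name, value in record.items()
        if name != "duration" and "_tot_" not in name
    }


tidy_row = drop_unused(row)
print(sorted(tidy_row))
```

Filtering on the `_tot_` infix works here because every performance statistic in the variable list carries it. The other variables removed for the tidy data set would need to be named explicitly.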

2.4 Description of variables in data set as organised in tidy form

Variable Description
circuit Either AVP (USA) or FIVB (International)
country Country where tournament played
year Year of tournament
date Date of match
gender Gender of team
w_player1 Winner player 1 name
w_p1_birthdate Winner player 1 birth date
w_p1_age Winner player 1 age
w_p1_hgt Winner player 1 height in inches
w_p1_country Winner player 1 country
w_player2 Winner player 2 name
w_p2_birthdate Winner player 2 birth date
w_p2_age Winner player 2 age
w_p2_hgt Winner player 2 height in inches
w_p2_country Winner player 2 country
l_player1 Losing player 1 name
l_p1_birthdate Losing player 1 birth date
l_p1_age Losing player 1 age
l_p1_hgt Losing player 1 height in inches
l_p1_country Losing player 1 country
l_player2 Losing player 2 name
l_p2_birthdate Losing player 2 birth date
l_p2_age Losing player 2 age
l_p2_hgt Losing player 2 height in inches
l_p2_country Losing player 2 country
score Match score, with points separated by a dash and sets separated by a comma, e.g. 21 points to 12 points is 21-12

2.5 Data sources

The original data is sourced from: Vagner, A. (2020, July 20). BigTimeStats/beach-volleyball. Retrieved August 22, 2020, from https://github.com/BigTimeStats/beach-volleyball

We loaded the data set from the “Tidy Tuesday” GitHub repository: Mock, J. (2020, May 19). rfordatascience/tidytuesday. Retrieved August 22, 2020, from https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-19/readme.md

3 Analysis and findings

3.1 Which countries have the most winning players?

For both the AVP and FIVB tournaments, a team consists of 2 players. The two players in a team can come from the same country or from different countries. Thus, in this section, our analysis focuses on finding the countries that had the largest number of winning teams. This will help us find the countries that had the most winning players.

In order to find our answer to this question, we first did some data wrangling to get the data set up for analysis. Then we followed the steps outlined below:

  • First, we found the distinct players, keeping the variables w_p1_country and w_p2_country. This ensures that we do not get multiple rows for the same team, that is, the same combination of player 1 and player 2.
  • After getting the distinct players, we grouped them by their respective countries.
  • Following this, we used the tally() function to count the total number of teams by country and dropped any rows with missing values.
  • This gave us a list of all the participating countries with a total count of winning teams per country.
  • We arranged the data to show the total count in descending order and renamed the variables to make them more meaningful.
  • Lastly, we saved this as a new data set called “country” to be used later on.
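The steps above can be sketched as follows. This is a minimal illustration in Python, not the R code used for the report. The sample rows and the function `count_winning_teams` are hypothetical; only the column names come from the data set.

```python
from collections import Counter

# Hypothetical rows; only the column names come from the data set.
matches = [
    {"w_player1": "A", "w_player2": "B", "w_p1_country": "United States", "w_p2_country": "United States"},
    {"w_player1": "A", "w_player2": "B", "w_p1_country": "United States", "w_p2_country": "United States"},
    {"w_player1": "C", "w_player2": "D", "w_p1_country": "United States", "w_p2_country": "United States"},
    {"w_player1": "E", "w_player2": "F", "w_p1_country": "Brazil", "w_p2_country": "Brazil"},
    {"w_player1": "G", "w_player2": "H", "w_p1_country": None, "w_p2_country": "Germany"},
]


def count_winning_teams(rows):
    """Count distinct winning teams per (player 1 country, player 2 country)."""
    seen_teams = set()
    counts = Counter()
    for row in rows:
        team = (row["w_player1"], row["w_player2"])
        if team in seen_teams:
            continue  # this pairing has already been counted once
        seen_teams.add(team)
        pair = (row["w_p1_country"], row["w_p2_country"])
        if None in pair:
            continue  # drop teams with a missing country
        counts[pair] += 1
    # most_common() returns the pairs sorted by count, largest first
    return counts.most_common()


country = count_winning_teams(matches)
print(country)
```

Team A/B wins twice in the sample but is counted once, which mirrors the purpose of the distinct step.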

Figure 3.1: Top 20 countries with the most winning teams

Figure 3.1 shows the top 20 countries with the largest number of winning teams. The United States was the most dominant country, with a total of 4200 winning teams. Since each team has two players, this corresponds to 8400 winning players from the United States. In a distant second place, Brazil had 258 winning teams, and so 516 Brazilian players won matches in teams where both players came from Brazil. In third place, Germany had 200 winning teams, comprising 400 players. The remaining 17 countries in this plot ranged from 166 winning teams down to 45.

The clear winner here is the United States, and we can conclude that the majority of the winning players in the AVP and FIVB tournaments come from the United States.

We decided to dig further into the United States. Although there were 4200 teams where both players came from the United States, there were instances where one player came from the United States and the other came from a different country. The following section looks at the countries whose players partnered with players from the United States.

In order to find the different countries that partnered with the United States, we followed the steps outlined below:

  • First, we filtered the rows from the “country” data set so that values from “Player 1 country” variable are “United States” or values from “Player 2 country” variable are “United States”.

This gave us a list of all the country combinations in which either player 1 or player 2 came from the United States, together with the country of the other player.
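The filter can be sketched as follows. This is a minimal illustration in Python, not the R code used for the report. The function `us_partnerships` is hypothetical. The first three rows are taken from Table 3.1, and the last row is a made-up non-US pairing added to show that it is filtered out.

```python
# Rows of the "country" data set as
# (player 1 country, player 2 country, number of teams).
country = [
    ("United States", "United States", 936),
    ("United States", "Brazil", 31),
    ("Brazil", "United States", 16),
    ("Brazil", "Brazil", 10),  # made-up row for illustration only
]


def us_partnerships(rows, home="United States"):
    """Keep rows where player 1 or player 2 comes from `home`."""
    return [row for row in rows if home in (row[0], row[1])]


us_rows = us_partnerships(country)
print(us_rows)
```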

Table 3.1: Different combinations of countries that partnered with the United States
Player 1 country Player 2 country Number of teams
United States United States 936
United States Brazil 31
Brazil United States 16
United States Australia 16
Canada United States 11
Philippines United States 8
United States England 7
United States Poland 7
United States Azerbaijan 6
Australia United States 5
England United States 5
United States Puerto Rico 5
Poland United States 4
United States Israel 4
Italy United States 3
New Zealand United States 3
Puerto Rico United States 3
United States Philippines 3
Israel United States 2
Russia United States 2

Table 3.1 shows 20 different country combinations, which is only a subset of the different countries that partnered with the United States. In total there were 66 different combinations.

Apart from teams in which both players came from the United States, Table 3.1 lists 31 teams with player 1 from the United States and player 2 from Brazil, and 4 teams with player 1 from Poland and player 2 from the United States.

The rest of the table shows just how prominent the United States is as a competing country in beach volleyball tournaments. It appears not only in teams where both players come from the United States, but also in teams where only one player comes from the United States and partners with a player from a different country.

3.2 What is the average age of a winning team and how does it compare with a losing team?

N.B. For the method used to complete this analysis, please refer to the commentary included within the code chunks.

The average ages of male winning players 1 and 2 are 29.40 and 29.32 respectively. The average ages of male losing players 1 and 2, on the other hand, are 29.08 and 28.95 respectively. Age shows no obvious association with winning or losing, as the average ages of winners and losers are about the same.

This might, however, tell us something about the typical age of participation in professional male volleyball. If we plot the ages of, for instance, male winning player 1 (Figure 3.2) and male losing player 2 (Figure 3.3), we see that the most commonly occurring ages are in the late 20s (28-29 years of age). Given the high levels of participation at those ages, it is reasonable to infer that male volleyball players hit their peak in their late 20s.

Now, let’s consider women’s volleyball. The average ages of female winning players 1 and 2 are 27.98 and 28.29 respectively. The average ages of female losing players 1 and 2 are 27.52 and 27.73 respectively. As in the men’s game, age does not seem to strongly influence winning. However, it is interesting to note that there is a slight difference in the average age of players between the genders. In Figure 3.4, we can see that the average age of winning player 2 is lower for females than for males. Similarly, in Figure 3.5, the average age of losing player 1 is also lower for females than for males.

Figure 3.2: Ages of Male Winning Player 1

Figure 3.3: Ages of Male Losing Player 2

Figure 3.4: Ages of Winning Player 2 by gender

Figure 3.5: Ages of Losing Player 1 by gender

3.3 What is the average height of a winning team and how does it compare with a losing team?

N.B. For the method used to complete this analysis, please refer to the commentary included within the code chunks.

The average heights of female winning players 1 and 2 are 70.91 and 70.85 inches respectively. The average heights of female losing players 1 and 2 are 70.62 and 70.72 inches respectively. Although the losing players are shorter on average than the winning players, the difference is small (less than 0.3 inches).

The average heights of male winning players 1 and 2 are 76.28 and 76.39 inches respectively, compared with 75.98 and 76.15 inches for losing players 1 and 2. Consider Figures 3.6 and 3.7, which display the differences in height between male winning and losing players 1 (Figure 3.6) and male winning and losing players 2 (Figure 3.7). In both cases, the differences in height are fairly evenly centred around 0, so we probably cannot say that a height difference affects winning a volleyball game. We can, however, say that male volleyball participants are generally taller than female participants, although this is not unique to volleyball.
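The height differences plotted in Figures 3.6 and 3.7 can be computed along these lines. This is a minimal illustration in Python, not the R code used for the report. The heights and the function `height_differences` are hypothetical; only the column names come from the data set.

```python
from statistics import mean

# Hypothetical heights in inches for winning and losing player 1.
matches = [
    {"w_p1_hgt": 77, "l_p1_hgt": 75},
    {"w_p1_hgt": 74, "l_p1_hgt": 76},
    {"w_p1_hgt": 76, "l_p1_hgt": 76},
    {"w_p1_hgt": None, "l_p1_hgt": 78},  # missing height, skipped
]


def height_differences(rows, winner="w_p1_hgt", loser="l_p1_hgt"):
    """Winner height minus loser height for matches with both values present."""
    return [
        row[winner] - row[loser]
        for row in rows
        if row[winner] is not None and row[loser] is not None
    ]


diffs = height_differences(matches)
mean_diff = mean(diffs)
print(diffs, mean_diff)
```

A mean difference close to 0, as in this made-up sample, is the pattern the text describes. Passing `winner="w_p2_hgt"` and `loser="l_p2_hgt"` gives the player 2 comparison.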

Figure 3.6: Difference in Heights of Male Player 1

Figure 3.7: Difference in Heights of Male Player 2

3.4 Looking into the FIVB circuit, is there any home advantage for the winning team?

In team sports, the term home advantage describes the benefit that the home team is said to gain over the visiting team. The home team is likely to be better adapted to the weather, temperature and other natural conditions at the venue. Additionally, the home team has no jet lag and may feel more secure psychologically. Home advantage is therefore a frequently discussed topic in sports competitions. Beach volleyball is no exception, so we examine below whether winning teams benefit from home advantage.

Firstly, since the host country and the contestants’ countries in the AVP competition are almost always the United States, it is not meaningful to discuss this issue for the AVP, so I only use the data from the FIVB competition. I start with the home winning rate regardless of gender. I select the observations in which the country where the tournament was played equals the country the winner is from, count these matches and save the count as the variable “num_winner”. After that, I compute the total number of matches hosted in every country each year and save it as the variable “num_total”. Next, I join these two tables together. Finally, I divide “num_winner” by “num_total” to get the winning rate regardless of gender.
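The calculation can be sketched as follows. This is a minimal illustration in Python, not the R code used for the report. The sample rows and the function `home_winning_rate` are hypothetical. The sketch makes two assumptions: a win counts as a home win only when both winning players come from the host country, and matches are grouped by host country only, ignoring the year for brevity.

```python
from collections import Counter

# Hypothetical match rows; only the column names come from the data set.
matches = [
    {"circuit": "FIVB", "country": "Brazil", "w_p1_country": "Brazil", "w_p2_country": "Brazil"},
    {"circuit": "FIVB", "country": "Brazil", "w_p1_country": "Germany", "w_p2_country": "Germany"},
    {"circuit": "FIVB", "country": "Germany", "w_p1_country": "Brazil", "w_p2_country": "Brazil"},
    {"circuit": "AVP", "country": "United States", "w_p1_country": "United States", "w_p2_country": "United States"},
]


def home_winning_rate(rows, circuit="FIVB"):
    """Percentage of matches hosted in each country that were won by a home team."""
    num_total = Counter()   # matches hosted per country
    num_winner = Counter()  # matches won by a home team per country
    for row in rows:
        if row["circuit"] != circuit:
            continue
        host = row["country"]
        num_total[host] += 1
        # Assumption: both winning players must come from the host country.
        if row["w_p1_country"] == host and row["w_p2_country"] == host:
            num_winner[host] += 1
    rates = {host: 100 * num_winner[host] / total for host, total in num_total.items()}
    # Sort in descending order of rate, as in the bar plot.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


rates = home_winning_rate(matches)
print(rates)
```

Adding a gender filter next to the circuit filter would give the per-gender rates discussed later in this section.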

After getting the results, I make a bar plot to show the rates in descending order.

Figure 3.8 shows that, across all eight years, although the United States has the highest home winning rate at 35.68%, every home winning rate is below fifty percent. That is to say, the winning rate at home is not higher than the winning rate away, which suggests that home advantage is not obvious in the FIVB competition.

Figure 3.8: Home winning rate for all teams

Although home advantage is not obvious for winning teams in general, is there any difference between winning teams of different genders?

We first look at women’s teams. Using the same method as before, I added a filter to keep only female teams, calculated the home winning rate and displayed the results in Figure 3.9. The United States has the highest rate at 39.56%, and no country has a rate over fifty percent, much the same as the overall situation. This indicates that home advantage is still not obvious for women’s teams.

Figure 3.9: Home winning rate for women

Using the same method as for the women’s teams, I selected only male teams and calculated the home winning rate. The results are shown in Figure 3.10. Home advantage is also not obvious for men: the highest rate, reached by the United States, is only 32.41%.

Figure 3.10: Home winning rate for men

Generally speaking, home advantage is not evident for winning teams in the FIVB tournament, regardless of gender. However, if we look further, we find that the home winning rate of women is higher than that of men in most countries. The exception is Poland, where the men’s home winning rate (9.68%) is higher than the women’s (4.26%).

In addition, no men’s team from England or Greece has ever won at home, which may indicate that men’s beach volleyball in these two countries is not strong enough or does not receive much attention. In any case, the United States has the highest home winning rate, which is consistent with it having the most winning teams, as explained previously.

3.5 Do lower-ranked teams ever beat higher-ranked teams?

The analysis so far suggests that the winning team is likely to be a strong team, but are there cases in which a lower-ranked team defeats a higher-ranked one? First, I focus on the FIVB tournament and filter the women’s matches in which the lower-ranked team defeated the higher-ranked one. I count these matches and save the count as the variable “num_rank”. After that, I compute the total number of matches hosted in every country each year and save it as the variable “num_total”. Next, I join these two tables together. Finally, I divide “num_rank” by “num_total” to get the proportion of matches won by the lower-ranked team. Second, I use the same method to compute the rate for men.
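The proportion can be sketched as follows. This is a minimal illustration in Python, not the R code used for the report. The sample rows and the function `upset_rate` are hypothetical. The sketch makes four assumptions: ranks are numeric, a smaller rank number means a higher-ranked team, gender is coded as "W" and "M", and matches with a missing rank are skipped. Matches are grouped by host country only, ignoring the year for brevity.

```python
from collections import Counter

# Hypothetical FIVB match rows; only the column names come from the data set.
matches = [
    {"country": "Italy", "gender": "W", "w_rank": 5, "l_rank": 2},     # lower-ranked team won
    {"country": "Italy", "gender": "W", "w_rank": 1, "l_rank": 8},
    {"country": "Italy", "gender": "M", "w_rank": 9, "l_rank": 3},     # lower-ranked team won
    {"country": "Italy", "gender": "W", "w_rank": None, "l_rank": 4},  # rank missing, skipped
]


def upset_rate(rows, gender):
    """Percentage of matches per host country won by the lower-ranked team."""
    num_total = Counter()  # matches with both ranks known
    num_rank = Counter()   # matches won by the lower-ranked team
    for row in rows:
        if row["gender"] != gender:
            continue
        if row["w_rank"] is None or row["l_rank"] is None:
            continue
        host = row["country"]
        num_total[host] += 1
        # Assumption: a larger rank number means a lower-ranked team.
        if row["w_rank"] > row["l_rank"]:
            num_rank[host] += 1
    return {host: 100 * num_rank[host] / total for host, total in num_total.items()}


women = upset_rate(matches, "W")
men = upset_rate(matches, "M")
print(women, men)
```

The direction of the rank comparison determines the result, so it is worth confirming against the data dictionary how w_rank and l_rank are encoded before relying on the proportions.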

Figure 3.11 shows that tournaments in Italy have the highest proportion of lower-ranked teams defeating higher-ranked teams, at about 63.64%, and tournaments in England have the lowest, at 47.92%. In most host countries, the lower-ranked team won more than half of the matches.

Figure 3.11: Lower-ranked team beats higher-ranked team (women, FIVB)

Figure 3.12 shows the results for men. In tournaments in Greece, the lower-ranked team won about 63.46% of matches. The only two host countries not over fifty percent are Brazil and China, with proportions of 44.44% and 47.92%. As with women, in most host countries the lower-ranked team won more than half of the men’s matches.

However, this proportion is greater for women than for men in most countries, which may indicate that the overall strength of the women’s teams is greater than that of the men’s teams. There is one outlier, England, where the men performed better than the women, with a higher proportion of 50%. This may indicate that men’s beach volleyball is stronger in the UK.

Figure 3.12: Lower-ranked team beats higher-ranked team (men, FIVB)

Next, we turn to the AVP tournament. I use the same method as in the FIVB analysis. However, in order to display the women’s and men’s rates in one plot, I manually create a tibble that contains only the gender and the rate. As the host country in the AVP is always the United States, I ignore the country variable. I also draw a plot to represent the results.

In Figure 3.13, the bar on the left shows the rate for men and the bar on the right shows the rate for women. The rate for men is about 50.25% and the rate for women is about 51.15%. In the AVP competition, the gender difference is therefore not obvious. As in the FIVB, the lower-ranked team wins more than half of the matches.

Figure 3.13: Lower-ranked team beats higher-ranked team (AVP)

Generally speaking, matches in which the lower-ranked team beats the higher-ranked team account for more than half of the total in both the FIVB and AVP tournaments. We can therefore say that it is not uncommon for lower-ranked teams to beat higher-ranked ones in beach volleyball. It also indirectly indicates that ranking in beach volleyball may not fully reflect the strength and winning rate of a team.

4 Conclusion

From our analysis, we conclude that a typical winning male volleyball team most likely has both players originating from the United States. Player one has an average age of 29.40 and an average height of 76.28 inches, and player two has an average age of 29.32 and an average height of 76.39 inches.

Similarly, a typical winning female volleyball team most likely has both players originating from the United States. Player one has an average age of 27.98 and an average height of 70.91 inches, and player two has an average age of 28.29 and an average height of 70.85 inches.

5 Acknowledgements

We thank the contributors of the packages cited in the References.

6 References

Auguie, B. (2017). gridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra

BigTimeStats. (n.d.). BigTimeStats/beach-volleyball. GitHub. https://github.com/BigTimeStats/beach-volleyball

Mock, J. (2020, May 19). rfordatascience/tidytuesday. Retrieved August 22, 2020, from https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-19/readme.md

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. https://plotly-r.com

Vagner, A. (2020, July 20). BigTimeStats/beach-volleyball. Retrieved August 22, 2020, from https://github.com/BigTimeStats/beach-volleyball

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Xie, Y. (2020). bookdown: Authoring Books and Technical Documents with R Markdown. https://github.com/rstudio/bookdown

Zhu, H. (2019). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.1.0. https://CRAN.R-project.org/package=kableExtra